ShuffleMapTask

ShuffleMapTask is a Task that writes records of a RDD partition to the shuffle system.

Caution

FIXME What RDD is ShuffleMapTask created for?

Note	Spark uses broadcast variables to send (serialized) tasks to executors.

Creating `ShuffleMapTask` Instance

Caution

FIXME

Writing Records From RDD Partition to Shuffle System — `runTask` Method

runTask(context: TaskContext): MapStatus

Note	`runTask` is a part of Task contract to…FIXME

runTask returns a MapStatus (with the location and estimated size of the result RDD block) after the records of the Partition were written to the shuffle system.

Internally, runTask uses the current closure Serializer to deserialize the taskBinary serialized task (into a pair of RDD and ShuffleDependency).

runTask measures the thread and CPU time for deserialization (using the System clock and JMX if supported) and stores it in _executorDeserializeTime and _executorDeserializeCpuTime attributes.

Note	`runTask` uses `SparkEnv` to access the current closure `Serializer`.

Note	The `taskBinary` serialized task is given when `ShuffleMapTask` is created.

runTask requests ShuffleManager for a ShuffleWriter (given the ShuffleHandle of the deserialized ShuffleDependency, RDD partition and input TaskContext).

Note	`runTask` uses `SparkEnv` to access the current `ShuffleManager`.

Note	The partition is given when `ShuffleMapTask` is created.

runTask computes the records in the RDD partition and writes them (to the shuffle system).

Note	This is the moment in `Task`'s lifecycle (and its RDD) when a RDD partition is computed and hence becomes a sequence of records (i.e. real data).

runTask stops the ShuffleWriter (with success flag enabled) and returns the MapStatus.

When the record writing was not successful, runTask stops the ShuffleWriter (with success flag disabled) and the exception is re-thrown.

You may also see the following DEBUG message in the logs when the ShuffleWriter could not be stopped.

DEBUG Could not stop writer

ShuffleMapTask

ShuffleMapTask

Creating `ShuffleMapTask` Instance

Writing Records From RDD Partition to Shuffle System — `runTask` Method

results matching ""

No results matching ""

ShuffleMapTask

Creating ShuffleMapTask Instance

Writing Records From RDD Partition to Shuffle System — runTask Method

results matching ""

No results matching ""

Creating `ShuffleMapTask` Instance

Writing Records From RDD Partition to Shuffle System — `runTask` Method